Doubly Outlier-Robust Online Infinite Hidden Markov Model
Yiu, Horace, Sánchez-Betancourt, Leandro, Cartea, Álvaro, Duran-Martin, Gerardo
We derive a robust update rule for the online infinite hidden Markov model (iHMM) for settings where the streaming data contain outliers and the model is misspecified. Leveraging recent advances in generalised Bayesian inference, we define robustness via the posterior influence function (PIF) and provide conditions under which the online iHMM has a bounded PIF. Imposing robustness inevitably induces an adaptation lag for regime switching. Our method, Batched Robust iHMM (BR-iHMM), balances adaptivity and robustness with two additional tunable parameters. Across limit order book data, hourly electricity demand, and a synthetic high-dimensional linear system, BR-iHMM reduces one-step-ahead forecasting error by up to 67% relative to competing online Bayesian methods. Together with the theoretical guarantee of a bounded PIF, our results highlight the practicality of our approach for both forecasting and interpretable online learning.
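The abstract does not define the PIF; a common formalisation from the generalised-Bayes robustness literature (an assumption here, not necessarily the exact definition used in this paper) measures how much a single contaminated observation $y_c$ can move the posterior:

$$\mathrm{PIF}(y_c;\, y_{1:t}) \;=\; \mathrm{KL}\!\left(\pi\big(\theta \mid y_{1:t-1},\, y_c\big)\,\middle\|\,\pi\big(\theta \mid y_{1:t-1},\, y_t\big)\right),$$

and an update rule is called outlier-robust when $\sup_{y_c} \mathrm{PIF}(y_c;\, y_{1:t}) < \infty$, i.e. no single observation can perturb the posterior arbitrarily.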
Breaking the $O(\sqrt{T})$ Cumulative Constraint Violation Barrier while Achieving $O(\sqrt{T})$ Static Regret in Constrained Online Convex Optimization
Balasundaram, Haricharan, Mahendran, Karthick Krishna, Vaze, Rahul
The problem of constrained online convex optimization is considered, where at each round, once a learner commits to an action $x_t \in \mathcal{X} \subset \mathbb{R}^d$, a convex loss function $f_t$ and a convex constraint function $g_t$ defining the constraint $g_t(x)\le 0$ are revealed. The objective is to simultaneously minimize the static regret and the cumulative constraint violation (CCV) relative to a benchmark that knows the loss and constraint functions $f_t$ and $g_t$ for all $t$ ahead of time and chooses a static optimal action that is feasible with respect to all constraints $g_t(x)\le 0$. In recent prior work, Sinha and Vaze [2024] proposed algorithms with simultaneous regret of $O(\sqrt{T})$ and CCV of $O(\sqrt{T})$ (or CCV of $O(1)$ in specific cases, e.g. when $d=1$, Vaze and Sinha [2025]). It is widely believed that CCV is $\Omega(\sqrt{T})$ in the worst case for any algorithm that ensures $O(\sqrt{T})$ regret when $d\ge 2$. In this paper, we refute this belief and show that the algorithm of Vaze and Sinha [2025] simultaneously achieves regret of $O(\sqrt{T})$ and CCV of $O(T^{1/3})$ when $d=2$.
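For concreteness, the two quantities the abstract trades off are standardly defined as follows (standard definitions from the constrained-OCO literature, stated here as an assumption about this paper's conventions):

$$\mathrm{Regret}(T) \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \sum_{t=1}^{T} f_t(x^\star), \qquad x^\star \in \arg\min_{\substack{x \in \mathcal{X} \\ g_t(x)\le 0\ \forall t}} \sum_{t=1}^{T} f_t(x),$$

$$\mathrm{CCV}(T) \;=\; \sum_{t=1}^{T} \max\big(g_t(x_t),\, 0\big),$$

so violations accumulate and cannot be cancelled by rounds where the constraint is strictly satisfied.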
The key challenge in making this connection is grounding the skills, so that each skill corresponds to a specific goal-conditioned policy. We start by recalling the definition of the discounted state occupancy measure (Eq. 3):
$$p(s_{t+} = s_g) = (1 - \gamma)\sum_{t=0}^{\infty} \gamma^{t}\, p(s_t = s_g).$$
On the second line, we have changed the bounds of the summation to start at 0, and changed the terms inside the summation accordingly. On the third line, we applied linearity of expectation to move the summation inside the expectation. On the fourth line, we applied linearity of expectation again to move the term for $t = 0$ inside the expectation. Finally, we substituted the definition of $r_g(s,a)$ to obtain the desired result. This result means that we are doing policy improvement with approximate Q-values.
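The intermediate lines of this derivation were lost in extraction; one plausible sketch of the steps the text describes (rewriting the occupancy measure as an expectation of indicators, then using linearity of expectation to separate the $t=0$ term) is, under that assumption:

$$\begin{aligned}
p(s_{t+} = s_g) &= (1-\gamma)\sum_{t=0}^{\infty}\gamma^{t}\, p(s_t = s_g)\\
&= (1-\gamma)\,\mathbb{E}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\,\mathbb{1}(s_t = s_g)\right]\\
&= (1-\gamma)\,\mathbb{E}\big[\mathbb{1}(s_0 = s_g)\big] \;+\; (1-\gamma)\,\mathbb{E}\!\left[\sum_{t=1}^{\infty}\gamma^{t}\,\mathbb{1}(s_t = s_g)\right],
\end{aligned}$$

after which the definition of $r_g(s,a)$ is substituted to obtain the stated result.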
Mingling Foresight with Imagination: Model-Based Cooperative Multi-Agent Reinforcement Learning
This paper proposes an implicit model-based multi-agent reinforcement learning method built on value decomposition methods. Under this method, agents can interact with the learned virtual environment and evaluate the current state value according to imagined future states in the latent space, endowing the agents with foresight. Our approach can be applied to any multi-agent value decomposition method.